Chameleon: Hierarchical Clustering Using Dynamic Modeling

Authors

  • George Karypis
  • Eui-Hong Han
  • Vipin Kumar
Abstract

Clustering is a discovery process in data mining [1]. It groups a set of data in a way that maximizes the similarity within clusters and minimizes the similarity between two different clusters [1,2]. These discovered clusters can help explain the characteristics of the underlying data distribution and serve as the foundation for other data mining and analysis techniques. Clustering is useful in characterizing customer groups based on purchasing patterns, categorizing Web documents [3], grouping genes and proteins that have similar functionality [4], grouping spatial locations prone to earthquakes based on seismological data, and so on.

Most existing clustering algorithms find clusters that fit some static model. Although effective in some cases, these algorithms can break down (that is, cluster the data incorrectly) if the user doesn't select appropriate static-model parameters. In other cases, the model cannot adequately capture the clusters' characteristics. Most of these algorithms break down when the data contains clusters of diverse shapes, densities, and sizes.

Existing algorithms use a static model of the clusters and do not use information about the nature of individual clusters as they are merged. Furthermore, one set of schemes (the CURE algorithm and related schemes) ignores the information about the aggregate interconnectivity of items in two clusters. The other set of schemes (the Rock algorithm, group averaging method, and related schemes) ignores information about the closeness of two clusters as defined by the similarity of the closest items across two clusters. (For more information, see the "Limitations of Traditional Clustering Algorithms" sidebar.)

By considering only interconnectivity or only closeness, these algorithms can easily select and merge the wrong pair of clusters. For instance, an algorithm that focuses only on the closeness of two clusters will incorrectly merge the clusters in Figure 1a over those in Figure 1b. Similarly, an algorithm that focuses only on interconnectivity will, in Figure 2, incorrectly merge the dark-blue cluster with the red cluster rather than the green one. Here, we assume that the aggregate interconnectivity between the items in the dark-blue and red clusters is greater than that of the dark-blue and green clusters. However, the border points of the dark-blue cluster are much closer to those of the green cluster than those of the red cluster.

Chameleon is a new agglomerative hierarchical clustering algorithm that overcomes the limitations of existing clustering algorithms. Figure 3 (on page 70) provides an overview of the overall approach …
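The Figure 2 situation can be reproduced with a toy example. The sketch below is illustrative only and does not use the paper's definitions: `closeness` is assumed here to be the distance between the closest pair of items across two clusters, and `interconnectivity` is a hypothetical proxy that counts the cross-cluster pairs lying within an arbitrarily chosen `RADIUS`. The point coordinates are made up so that the blue and green clusters have the closest border points while the blue and red clusters share more connections.

```python
from itertools import product
from math import dist

def closeness(a, b):
    """Distance between the closest pair of items across a and b (smaller = closer)."""
    return min(dist(p, q) for p, q in product(a, b))

def interconnectivity(a, b, radius):
    """Illustrative proxy: number of cross-cluster pairs within `radius` of each other."""
    return sum(1 for p, q in product(a, b) if dist(p, q) <= radius)

# Made-up 2-D data loosely mimicking the Figure 2 scenario.
blue = [(0.0, float(y)) for y in range(5)]    # vertical strip at x = 0
red = [(1.0, float(y)) for y in range(5)]     # parallel strip, many near neighbours
green = [(-0.5, 2.0), (-3.0, 2.0), (-4.0, 2.0)]  # one very close border point

RADIUS = 1.2  # assumed neighbourhood radius, chosen only for this example
candidates = {"red": red, "green": green}

# A closeness-only criterion picks the candidate with the smallest gap.
by_closeness = min(candidates, key=lambda name: closeness(blue, candidates[name]))

# An interconnectivity-only criterion picks the candidate with the most links.
by_interconnectivity = max(
    candidates, key=lambda name: interconnectivity(blue, candidates[name], RADIUS)
)

print("closeness only        ->", by_closeness)
print("interconnectivity only ->", by_interconnectivity)
```

On this data the two criteria disagree about which cluster to merge with blue, which is the kind of conflict the abstract describes; an approach that weighs both signals avoids committing to either one blindly.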

Similar resources

CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling

Clustering in data mining is a discovery process that groups a set of data such that the intracluster similarity is maximized and the intercluster similarity is minimized. Existing clustering algorithms, such as K-means, PAM, CLARANS, DBSCAN, CURE, and ROCK, are designed to find clusters that fit some static models. These algorithms can break down if the choice of parameters in the static model ...

The New Software Package for Dynamic Hierarchical Clustering for Circles Types of Shapes

In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. ...

A Modified Multilevel Approach to the Dynamic Hierarchical Clustering for Complex types of Shapes

In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. ...

Parallel Algorithm for the Chameleon Clustering Algorithm using Dynamic Modeling

With the increasing size of data sets in application areas like biomedicine, hospitals, information systems, scientific data processing and predictions, finance analytics, communications, retail and marketing, it is becoming increasingly important to execute data mining tasks in parallel. At the same time, technological advancements have made shared-memory parallel computation machines commonly ...

Hierarchical Clustering Algorithms in Data Mining

Clustering is a process of grouping objects and data into clusters so that data objects from the same cluster are similar to each other. Clustering algorithms are one of the areas in data mining, and they can be classified into partition, hierarchical, density-based, and grid-based. Therefore, in this paper we survey and review four major hierarchical clustering algorithms calle...

Journal title:
  • IEEE Computer

Volume 32  Issue 

Pages  -

Publication date: 1999